-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23924][SQL] Add element_at function #21053
Conversation
Test build #89266 has finished for PR 21053 at commit
|
retest this please |
Test build #89284 has finished for PR 21053 at commit
|
cc @ueshin |
Test build #89313 has finished for PR 21053 at commit
|
retest this please |
Test build #89321 has finished for PR 21053 at commit
|
@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) | |||
|
|||
override def prettyName: String = "array_contains" | |||
} | |||
|
|||
/** | |||
* Returns the value of index `right` in Array `left` or key right` in Map `left`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can improve this doc. There is also broken quote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks
Test build #89336 has finished for PR 21053 at commit
|
Test build #89356 has finished for PR 21053 at commit
|
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
) | |||
} | |||
|
|||
test("element at function") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need so many test cases here? this is just to verify the api works end to end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the function is element_at, not "element at" ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you correct the description saying element_at
instead of array_position
?
* We need to do type checking here as `key` expression maybe unresolved. | ||
*/ | ||
case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil | ||
with ExtractValue with NullIntolerant { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe extends ...
should be this line.
python/pyspark/sql/functions.py
Outdated
@@ -1846,6 +1846,27 @@ def array_contains(col, value): | |||
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) | |||
|
|||
|
|||
@since(2.4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to annotate as @ignore_unicode_prefix
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks
| ${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)}; | ||
| } | ||
|} | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stripMargin
is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks
|if ($eval1.isNullAt($index)) { | ||
| ${ev.isNull} = true; | ||
|} else | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stripMargin
is missing?
) | ||
} | ||
|
||
override def nullable: Boolean = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid it's wrong because this returns null
when the given index is "out of bounds" (array.numElements() < math.abs(index)
) for array type or the given key doesn't exist for map type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. Invalid right
can cause null result too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, may depend on right
value, too.
@ExpressionDescription( | ||
usage = """ | ||
_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements | ||
from the last to the first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Returns NULL if the index exceeds the length of the array
.
} else { | ||
array.numElements() + index | ||
} | ||
if (array.isNullAt(idx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if(left.dataType.asInstanceOf[ArrayType].containsNull && array.isNullAt(idx)) {
...
}
Test build #89383 has finished for PR 21053 at commit
|
Test build #89403 has finished for PR 21053 at commit
|
@ueshin would it be possible to review this again? |
Test build #89476 has finished for PR 21053 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for some nits.
python/pyspark/sql/functions.py
Outdated
returns value for the given key in value if col is map. | ||
|
||
:param col: name of column containing array or map | ||
:param value: value to check for in array or key to check for in map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note or something to notice the index is 1-based.
@ExpressionDescription( | ||
usage = """ | ||
_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements | ||
from the last to the first. Returns NULL if the index exceeds the length of the array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
python/pyspark/sql/functions.py
Outdated
@@ -1846,6 +1846,28 @@ def array_contains(col, value): | |||
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) | |||
|
|||
|
|||
@ignore_unicode_prefix | |||
@since(2.4) | |||
def element_at(col, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name value
is confusing because this is for an index or a key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, the 2nd argument acts as index
for an array or key
for a map. This is why I used value
.
Can I use index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about extraction
which is from Column.apply(extraction)
? cc @gatorsmile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
override def dataType: DataType = left.dataType match { | ||
case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType | ||
case _: MapType => left.dataType.asInstanceOf[MapType].valueType | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about:
override def dataType: DataType = left.dataType match {
case ArrayType(elementType, _) => elementType
case MapType(_, valueType, _) => valueType
}
Test build #89502 has finished for PR 21053 at commit
|
Test build #89520 has finished for PR 21053 at commit
|
retest this please |
Test build #89528 has finished for PR 21053 at commit
|
Test build #89545 has finished for PR 21053 at commit
|
Test build #89553 has finished for PR 21053 at commit
|
Test build #89554 has finished for PR 21053 at commit
|
Test build #89552 has finished for PR 21053 at commit
|
retest this please |
Test build #89558 has finished for PR 21053 at commit
|
Thanks! merging to master. |
thank you very much for your comments |
What changes were proposed in this pull request?
The PR adds the SQL function
element_at
. The behavior of the function is based on Presto's one.This function returns element of array at given index in value if column is array, or returns value for the given key in value if column is map.
How was this patch tested?
Added UTs